An empirical comparison of min-max-modular k -NN with different voting methods to large-scale text categorization
نویسندگان
چکیده
Text categorization refers to the task of assigning the pre-defined classes to text documents based on their content. k-NN algorithm is one of top performing classifiers on text data. However, there is little research work on the use of different voting methods over text data. Also, when a huge number of training data is available online, the response speed slows down, since a test document has to obtain the distancewith each training data.On the other hand, min–max-modular k-NN (M3-k-NN) has been applied to large-scale text categorization. M3-k-NN achieves a good performance and has faster response speed in a parallel computing environment. In this paper, we investigate five different voting methods for k-NN and M3-k-NN. The experimental results and analysis show that the Gaussian voting method can achieve the best performance among all voting methods for both k-NN and M3-k-NN. In addition, The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under the grants NSFC 60375022 and NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University. K. Wu · B.-L. Lu (B) Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, 200240, China e-mail: [email protected] B.-L. Lu e-mail: [email protected] M. Utiyama · H. Isahara Knowledge Creating Communication Research Center, National Institute of Information andCommunications Technology, 3-5 Hilaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan e-mail: [email protected] H. Isahara e-mail: [email protected] M3-k-NN uses less k-value to achieve the better performance than k-NN, and thus is faster than k-NN in a parallel computing environment.
منابع مشابه
A Modular k-Nearest Neighbor Classification Method for Massively Parallel Text Categorization
This paper presents a Min-Max modular k-nearest neighbor (M-k-NN) classification method for massively parallel text categorization. The basic idea behind the method is to decompose a large-scale text categorization problem into a number of smaller two-class subproblems and combine all of the individual modular k-NN classifiers trained on the smaller two-class subproblems into an M-k-NN classifi...
متن کاملAn Empirical Comparison of Text Categorization Methods
In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Recipro...
متن کاملUsing the feature projection technique based on a normalized voting method for text classification
This paper proposes a new approach for text categorization, based on a feature projection technique. In our approach, training data are represented as the projections of training documents on each feature. The voting for a classification is processed on the basis of individual feature projections. The final classification of test documents is determined by a majority voting from the individual ...
متن کاملText Categorization using Feature Projections
This paper proposes a new approach for text categorization, based on a feature projection technique. In our approach, training data are represented as the projections of training documents on each feature. The voting for a classification is processed on the basis of individual feature projections. The final classification of test documents is determined by a majority voting from the individual ...
متن کاملApplication of k Nearest Neighbor on Feature Projections Classi er to Text Categorization
This paper presents the results of the application of an instance based learning algorithm k Nearest Neighbor Method on Fea ture Projections k NNFP to text categorization and compares it with k Nearest Neighbor Classi er k NN k NNFP is similar to k NN ex cept it nds the nearest neighbors according to each feature separately Then it combines these predictions using a majority voting This prop er...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Soft Comput.
دوره 12 شماره
صفحات -
تاریخ انتشار 2008